Abstract: This article investigates nonparametric estimation of variance functions for functional data when the mean function is unknown. We obtain asymptotic results for the kernel estimator based on squared residuals. Similar to the finite-dimensional case, our asymptotic result shows that the smoothness of the unknown mean function has an effect on the rate of convergence. Our simulation studies demonstrate that the estimator based on residuals performs much better than the one based on the conditional second moment of the responses.
1 Introduction

Recently, there has been increased interest in the statistical modelling of functional data. In many experiments, functional data appear as the basic unit of observation. As a natural extension of multivariate data analysis, functional data analysis provides valuable insights into these problems. Compared with discrete multivariate analysis, functional analysis takes into account the smoothness of the high-dimensional covariates, and it often suggests new approaches to problems that had not been considered before. Even for nonfunctional data, the functional approach can often offer new perspectives on old problems.

The literature contains an impressive range of functional analysis tools for various problems, including exploratory functional principal component analysis, canonical correlation analysis, classification and regression. Two major approaches exist. The more traditional approach, carefully documented in the monograph Ramsay & Silverman (2005), typically starts by representing functional data by an expansion with respect to a certain basis, and subsequent inferences are carried out on the coefficients. The most commonly used bases include the B-spline basis for nonperiodic data and the Fourier basis for periodic data. Another line of work by the French school, e.g. Ferraty & Vieu (2002), takes a nonparametric point of view and extends traditional nonparametric techniques, most notably the kernel estimate, to the functional case. Some theoretical results are also obtained as generalizations of the convergence properties of the classical kernel estimate.

The functional nonparametric regression model, introduced in Ferraty & Vieu (2003), is defined as

$$Y_i = m(X_i) + \sqrt{v(X_i)}\,\epsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$

where we emphasize the heterogeneity of the regression model, which is the focus of this article. We assume that the $\epsilon_i$'s are random variables with $E(\epsilon_i \mid X_i) = 0$ and $\mathrm{Var}(\epsilon_i \mid X_i) = 1$, so that $\mathrm{Var}(Y_i \mid X_i) = v(X_i)$. The covariates $X_i$ are assumed to belong to some semi-metric vector space $\mathcal{H}$ endowed with the semi-metric $d(\cdot, \cdot)$. Many previous nonparametric functional regression studies (Ferraty & Vieu 2004; Masry 2005; Lian 2007) focused on estimating the mean function $m$. Here, by contrast, we are interested in estimating $v$ when $m$ is unknown, so the mean function only plays the role of a nuisance parameter.

Variance function estimation has received much attention since the 1980s, when it was needed to construct confidence intervals for the mean function. Müller & Stadtmüller (1987) also discussed its use in obtaining more efficient estimators of the mean function. There are two main approaches to variance function estimation.

In Müller & Stadtmüller (1987, 1993), the variance function was estimated directly from local contrasts of the responses. More recently, Brown & Levine (2007) and Wang et al. (2008) obtained minimax convergence rates for estimators based on local differences, and Cai et al. (2009) further extended this to multivariate regression. These asymptotic theories were developed for fixed covariates on a grid, and it is not straightforward to extend them to random covariates. For functional data, it is not even clear how to define a grid on the semi-metric space $\mathcal{H}$.

A different direction was taken in Hall & Carroll (1989), where the variance function was estimated by a weighted smoothing of squared residuals after a fit for the mean function was obtained. This approach was also considered in Fan & Yao (1998) using local polynomial regression. Finally, we mention the adaptive estimation of the variance function in Cai & Wang (2008) by thresholding of wavelet coefficients.

In the following sections, we adapt the idea of variance estimation in nonparametric regression based on squared residuals to the functional setting. In Section 2, we review the functional nonparametric regression model in a semi-metric functional vector space. Then we introduce functional nonparametric variance estimation in this general setting and describe the asymptotic results for our kernel-type estimator. We also discuss the effect of the unknown mean function on the variance estimator and relate it to the finite-dimensional case. In Section 3, we carry out a simulation study to demonstrate that the residual-based estimator is more efficient than the estimator based on nonparametric regression of the squared responses. Finally, we illustrate the approach on the popular spectrometric data for predicting fat content. The technical proofs of our asymptotic results are deferred to the appendix.



2 Nonparametric Functional Variance Estimation

In the functional nonparametric regression model (1), presented originally in Ferraty & Vieu (2002), the mean function is estimated by a kernel-type estimator

$$\hat m(x) = \frac{\sum_{i=1}^n K\big(d_m(x, X_i)/h_m\big)\, Y_i}{\sum_{i=1}^n K\big(d_m(x, X_i)/h_m\big)},$$

where the $Y_i$ are the real-valued responses and $h_m$ is the bandwidth used for estimating the mean function. We write $d_m$ for the semi-metric used in mean function estimation because a different semi-metric will be used for variance function estimation. Denote $R(X, Y) = (Y - m(X))^2$. Since under model (1) we have $E(R(X, Y) \mid X) = v(X)$, a natural kernel-type estimator for $v(x)$ (when the mean function is known) is

$$\hat v(x) = \frac{\sum_{i=1}^n K\big(d_v(x, X_i)/h_v\big)\, R_i}{\sum_{i=1}^n K\big(d_v(x, X_i)/h_v\big)}, \qquad (2)$$

where $R_i = (Y_i - m(X_i))^2$ and $h_v$ is the chosen bandwidth of the kernel. The semi-metric $d_v$ used for estimating the variance function is in general different from the semi-metric $d_m$ used for estimating the mean function. Using different semi-metrics is important in some cases, as our experiment with spectrometric data later demonstrates. We could use different kernels for the mean and variance functions, but we use the same kernel here, mainly for notational simplicity.

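In practice, the mean function $m(\cdot)$ is typically unknown, and a natural approach is to replace $m$ by the nonparametric estimator $\hat m$. Equivalently, we replace $R_i$ by $\hat R_i = (Y_i - \hat m(X_i))^2$ in (2). The resulting residual-based estimator is still denoted $\hat v(x)$.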
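The two-step residual-based estimator described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the npfda implementation used in Section 3. We assume the following:

- Curves are discretized on a common equally spaced grid.
- The $L_2$ semi-metric is approximated by a Riemann sum.
- The asymmetric quadratic kernel $K(u) = 1.5(1 - u^2)$ on $[0, 1]$ is an illustrative choice.

All function names are hypothetical. The normalizing constants cancel in the kernel ratios, so the code works with row-normalized weights directly. The mean step includes the $i$-th curve in its own fit, since the kernel sum runs over all observations.

```python
import numpy as np


def quad_kernel(u):
    """Asymmetric quadratic kernel 1.5 * (1 - u^2) on [0, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where((u >= 0.0) & (u <= 1.0), 1.5 * (1.0 - u**2), 0.0)


def l2_dist(A, B, dt):
    """Pairwise L2 semi-metric between the rows of A and the rows of B.

    Rows are curves on a common equally spaced grid with spacing dt; the
    integral of the squared difference is approximated by a Riemann sum.
    """
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1) * dt)


def kernel_weights(D, h):
    """Row-normalized weights K(D/h) / sum_k K(D/h); rows without neighbors give NaN."""
    K = quad_kernel(D / h)
    with np.errstate(invalid="ignore", divide="ignore"):
        return K / K.sum(axis=1, keepdims=True)


def residual_variance_estimate(X, Y, X_new, dist_m, dist_v, h_m, h_v):
    """Two-step estimate of v at the curves in X_new.

    dist_m and dist_v are callables (A, B) -> distance matrix playing the
    roles of the semi-metrics d_m and d_v; they may differ.
    """
    # Step 1: m_hat_i = sum_j w_ij Y_j, with the sum over all j (including j = i).
    m_hat = kernel_weights(dist_m(X, X), h_m) @ Y
    # Step 2: squared residuals R_hat_i = (Y_i - m_hat_i)^2.
    R_hat = (Y - m_hat) ** 2
    # Step 3: Nadaraya-Watson smoothing of R_hat with d_v and bandwidth h_v.
    return kernel_weights(dist_v(X_new, X), h_v) @ R_hat
```

For instance, with curves sampled every 0.01 on $[-1, 1]$, one could pass `dist = lambda A, B: l2_dist(A, B, 0.01)` for both semi-metrics. The bandwidths still have to be chosen, e.g. by cross-validation.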

Only independent data are considered in our simulations and real data application. For the asymptotic analysis, however, we present our results in a more general context by considering a strongly mixing sequence $\{(X_i, Y_i), i = 1, \ldots, n\}$. Our asymptotic result is stated for a fixed $x \in \mathcal{H}$.



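Following the notation in Ferraty & Vieu (2006), we define

$$\Delta_i^m = \frac{K\big(d_m(x, X_i)/h_m\big)}{E\,K\big(d_m(x, X_i)/h_m\big)}, \qquad \Delta_i^v = \frac{K\big(d_v(x, X_i)/h_v\big)}{E\,K\big(d_v(x, X_i)/h_v\big)},$$

$$\hat r_1^m = \frac1n\sum_{i=1}^n \Delta_i^m, \quad \hat r_2^m = \frac1n\sum_{i=1}^n Y_i\,\Delta_i^m, \quad \hat r_1^v = \frac1n\sum_{i=1}^n \Delta_i^v, \quad \hat r_2^v = \frac1n\sum_{i=1}^n \big(Y_i - \hat m(X_i)\big)^2\,\Delta_i^v,$$

so that $\hat m(x) = \hat r_2^m / \hat r_1^m$ and $\hat v(x) = \hat r_2^v / \hat r_1^v$. For notational simplicity, in the rest of the article we denote $m_i = m(X_i)$, $\hat m_i = \hat m(X_i)$, $v_i = v(X_i)$ and $\hat v_i = \hat v(X_i)$. We also set $w_{ij} = K\big(d_m(X_i, X_j)/h_m\big) / \sum_k K\big(d_m(X_i, X_k)/h_m\big)$, so that $\hat m_i = \sum_j w_{ij} Y_j$.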
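As in Ferraty & Vieu (2004, 2006), the rate of convergence of $\hat v(x)$ depends critically on the quantities $s_n^m$ and $s_n^v$ defined by

$$s_n^m = \max\{s_{n,1}^m, s_{n,2}^m, s_{n,3}^m, s_{n,4}^m\}, \qquad s_n^v = \max\{s_{n,1}^v, s_{n,2}^v, s_{n,3}^v, s_{n,4}^v\},$$

where

$$s_{n,1}^m = \sum_{i=1}^n\sum_{j=1}^n \big|\mathrm{Cov}(\Delta_i^m, \Delta_j^m)\big|, \qquad (3)$$

$$s_{n,2}^m = \sum_{i=1}^n\sum_{j=1}^n E\big|\mathrm{Cov}(\Delta_i^m \epsilon_i, \Delta_j^m \epsilon_j \mid X_1^n)\big|, \qquad (4)$$

$$s_{n,3}^m = \sum_{i=1}^n\sum_{j=1}^n \big|\mathrm{Cov}(\Delta_i^m m_i, \Delta_j^m m_j)\big|, \qquad (5)$$

$$s_{n,4}^m = n\, E\Big|\sum_{i=1}^n\sum_{j=1}^n \Delta_i^v w_{ij} \epsilon_i \epsilon_j\Big|, \qquad (6)$$

$$s_{n,1}^v = \sum_{i=1}^n\sum_{j=1}^n \big|\mathrm{Cov}(\Delta_i^v, \Delta_j^v)\big|, \qquad (7)$$

$$s_{n,2}^v = \sum_{i=1}^n\sum_{j=1}^n E\big|\mathrm{Cov}(\Delta_i^v \epsilon_i, \Delta_j^v \epsilon_j \mid X_1^n)\big|, \qquad (8)$$

$$s_{n,3}^v = \sum_{i=1}^n\sum_{j=1}^n \big|\mathrm{Cov}(\Delta_i^v v_i, \Delta_j^v v_j)\big|, \qquad (9)$$

$$s_{n,4}^v = \sum_{i,j,k,l=1}^n E\big|\mathrm{Cov}(\Delta_i^v w_{ij} \epsilon_i \epsilon_j, \Delta_k^v w_{kl} \epsilon_k \epsilon_l \mid X_1^n)\big|, \qquad (10)$$

where in some of the expressions above the covariances are conditioned on the observed covariates $X_1^n = (X_1, \ldots, X_n)$.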



Ferratv &: Vieul (|2006l ) and impose the following condition on the 



We follow 
kernel function 

K is supported on [0, 1], bounded and bounded away from zero on [0, 1]. (11) 
As in the case of mean function estimation, we need the following regularity conditions:

$$|m(x_1) - m(x_2)| \le C\, d_m(x_1, x_2)^{\alpha}, \qquad |v(x_1) - v(x_2)| \le C\, d_v(x_1, x_2)^{\beta}, \qquad \alpha > 0,\ \beta > 0. \qquad (12)$$



In Ferraty & Vieu (2004, 2006), moment conditions are assumed directly on the response $Y$. We find it more natural to impose the moment condition on the error:

$$\exists\, p > 4: \quad E|\epsilon|^p < \infty. \qquad (13)$$

The proof below needs uniform convergence of the mean function estimator over a compact neighborhood $\mathcal{C}$ of $x$ in $\mathcal{H}$. For this, we assume that for any $l > 0$, $\mathcal{C}$ can be covered by $\tau_l$ balls of radius $l$:

$$\mathcal{C} \subset \bigcup_{k=1}^{\tau_l} B(t_k, l), \qquad \text{with } \tau_l\, l^{a} = C \text{ for some } a > 0,\ C > 0. \qquad (14)$$

This condition is exactly the same as that in Ferraty & Vieu (2008), and interested readers can find some related discussion there.
Now we are ready to state our main result: 

Theorem 1 Under conditions (11)–(14), for a fixed $x \in \mathcal{H}$, we have

$$|\hat v(x) - v(x)| = O\left(h_m^{2\alpha} + \frac{s_n^m \log n}{n^2} + h_v^{\beta} + \sqrt{\frac{s_n^v \log n}{n^2}}\right)$$

in probability.



Remark 1 In Ferraty & Vieu (2004, 2006), the asymptotic results are stated as almost complete convergence, which is stronger than convergence in probability. Proving this stronger convergence for our variance estimator is difficult because U-type statistics appear in the proof, so we settle for the weaker type of convergence here.
Remark 2 For strongly mixing data sequences, Ferraty & Vieu (2006) discussed in detail how $s_n^m$ depends on two quantities: $\phi_m(h) := P(d_m(x, X) < h)$ and $\psi_m(h) := P(d_m(x, X_1) < h,\ d_m(x, X_2) < h)$. Those results can be adapted for our purposes. For example, for independent and identically distributed data, as shown in the appendix, we have $s_n^m = O(n/\phi_m(h_m))$ and $s_n^v = O(n/\phi_v(h_v))$ with $\phi_v(h) := P(d_v(x, X) < h)$. Thus in the i.i.d. case we have the following direct consequence.

Corollary 1 Under conditions (11)–(14), assume in addition that the data $\{(X_i, Y_i), i = 1, \ldots, n\}$ are i.i.d. Assume also that the bandwidths are chosen such that $h_m \to 0$, $h_v \to 0$, $n\phi_m(h_m) \to \infty$ and $n\phi_v(h_v) \to \infty$. Then

$$|\hat v(x) - v(x)| = O\left(h_m^{2\alpha} + \frac{\log n}{n\,\phi_m(h_m)} + h_v^{\beta} + \sqrt{\frac{\log n}{n\,\phi_v(h_v)}}\right) \quad \text{in probability}.$$
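The conditions $n\phi_m(h_m) \to \infty$ and $n\phi_v(h_v) \to \infty$ involve small-ball probabilities that are unknown in practice. As a rough finite-sample diagnostic, $\phi(h)$ can be estimated by the fraction of sample curves within distance $h$ of $x$. Then $n\hat\phi(h)$ is the number of curves that actually enter the kernel average. This diagnostic is our suggestion and is not part of the paper's method. The hypothetical helper below takes precomputed distances $d(x, X_i)$ under whichever semi-metric is in use.

```python
import numpy as np


def small_ball_diagnostics(dists_to_x, bandwidths):
    """Empirical small-ball probabilities and local sample sizes.

    phi_hat(h) = #{i : d(x, X_i) < h} / n estimates phi(h) = P(d(x, X) < h);
    n * phi_hat(h) is the number of curves inside the ball of radius h.
    """
    d = np.asarray(dists_to_x, dtype=float)
    h = np.asarray(bandwidths, dtype=float)
    phi_hat = (d[None, :] < h[:, None]).mean(axis=1)
    return phi_hat, d.size * phi_hat
```

A very small local count for a candidate bandwidth suggests that the asymptotic regime of the corollary is far from reached at that bandwidth.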

Remark 3 The corollary reveals an interesting effect of the unknown mean on variance function estimation. For simplicity and specificity, assume that $X$ is of fractal order $d$ with respect to both $d_m$ and $d_v$, i.e., $\phi_m(h) \sim \phi_v(h) \sim h^d$. Ferraty & Vieu (2006), Lemma 13.6, showed the following: if $\mathcal{H}$ is a separable Hilbert space with the semi-metric defined by the projection onto the first $d$ elements of an orthonormal basis, then $\phi(h) \sim h^d$. This is also true for $d$-dimensional regression (i.e., $\mathcal{H} = \mathbb{R}^d$).

With $h_m \sim (\log n/n)^{1/(2\alpha+d)}$ and $h_v \sim (\log n/n)^{1/(2\beta+d)}$, we obtain the rate of convergence $\max\{(\log n/n)^{2\alpha/(2\alpha+d)}, (\log n/n)^{\beta/(2\beta+d)}\}$. If $2\alpha/(2\alpha+d) > \beta/(2\beta+d)$, the rate becomes $(\log n/n)^{\beta/(2\beta+d)}$, which is the same as the rate obtained when the mean function $m(\cdot)$ is known. Thus, when the mean function is smooth enough, it has no effect on variance function estimation, while its effect cannot be ignored for less smooth mean functions.

In particular, $2\alpha/(2\alpha+d) > \beta/(2\beta+d)$ holds as soon as $\alpha > d/2$, since then $2\alpha/(2\alpha+d) > 1/2 > \beta/(2\beta+d)$. This result matches Hall & Carroll (1989) for one-dimensional regression. There the authors observed that the mean has no effect on variance function estimation as long as $\alpha > 1/2$ (see the last sentence of Section 2.2 of Hall & Carroll (1989)).
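The exponent arithmetic in Remark 3 can be reproduced with a small helper. It assumes $\phi_m(h) \sim \phi_v(h) \sim h^d$ and the bandwidth choices above. It returns exponents $r$ in rates of the form $(\log n/n)^r$, where a larger $r$ means faster convergence. Exact rational arithmetic is used so that the comparison of the two exponents is not affected by rounding. The function name is hypothetical.

```python
from fractions import Fraction


def rate_exponents(alpha, beta, d):
    """Exponents r in (log n / n)^r for the bound of Corollary 1 when phi(h) ~ h^d."""
    alpha, beta, d = Fraction(alpha), Fraction(beta), Fraction(d)
    mean_part = 2 * alpha / (2 * alpha + d)  # balances h_m^{2 alpha} and log n / (n h_m^d)
    var_part = beta / (2 * beta + d)         # balances h_v^beta and sqrt(log n / (n h_v^d))
    return {
        "h_m": 1 / (2 * alpha + d),          # h_m ~ (log n / n)^{1 / (2 alpha + d)}
        "h_v": 1 / (2 * beta + d),           # h_v ~ (log n / n)^{1 / (2 beta + d)}
        "mean_part": mean_part,
        "var_part": var_part,
        "overall": min(mean_part, var_part),  # the slower (smaller) exponent dominates
        "mean_negligible": mean_part > var_part,
    }
```

For example, `rate_exponents(1, 1, 1)` gives a mean exponent of 2/3 and a variance exponent of 1/3. The estimated mean then does not affect the rate. `rate_exponents(Fraction(1, 4), 2, 1)` gives 1/3 against 2/5, so the rough mean dominates.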



Remark 4 The simple relationship $v(x) = E(Y^2 \mid X = x) - (E(Y \mid X = x))^2$ motivates a direct estimator. It estimates the conditional expectation of the squared responses and sets $\hat v(x) = \hat s(x) - \hat m(x)^2$, where $\hat s(x)$ is the nonparametric kernel-type estimate of $E(Y^2 \mid X = x)$. This estimator is briefly mentioned in Ferraty et al. (2001). It can be shown that this estimator has the same convergence rate as above. However, in the one-dimensional case, Fan & Yao (1998) pointed out that the direct method can create a very large bias. Their intuitive explanation is that the direct estimator is obtained by replacing $\hat R_i = (Y_i - \hat m(X_i))^2$ in the residual-based method by $(Y_i - \hat m(x))^2$. This explanation also applies to our functional context. In the simulation study presented next, the direct method performs much worse than the residual-based method whenever the mean function is not constant.
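The direct estimator can be rewritten to make this replacement explicit. We assume that $\hat s$ and $\hat m$ use the same kernel, semi-metric and bandwidth; the text does not state this explicitly. Both are then weighted averages with common weights $w_i(x) = K(d(x, X_i)/h) / \sum_k K(d(x, X_k)/h)$ that sum to one:

```latex
\hat s(x) - \hat m(x)^2
  = \sum_{i=1}^n w_i(x)\, Y_i^2 - \Big(\sum_{i=1}^n w_i(x)\, Y_i\Big)^2
  = \sum_{i=1}^n w_i(x)\, Y_i^2 - 2\hat m(x) \sum_{i=1}^n w_i(x)\, Y_i + \hat m(x)^2 \sum_{i=1}^n w_i(x)
  = \sum_{i=1}^n w_i(x)\, \big(Y_i - \hat m(x)\big)^2 .
```

Each squared deviation is therefore centred at the fitted mean at the target point $x$ rather than at $X_i$. When $m$ varies within the kernel window, $(Y_i - \hat m(x))^2$ also picks up a term close to $(m(X_i) - m(x))^2$, which is one way to see where the extra bias comes from.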



3 Experiments 
3.1 Simulation Study 

We now examine the finite-sample performance of our variance estimator and compare it with the direct method based on squared responses. We use three examples with different mean and variance functions. For each example, 100 simulations are performed, with $n = 200$ data points generated in each simulation. In all three examples, $X_i$ is a random function supported on $[-1, 1]$.
For the first example, we set

$$m(x) = 0, \qquad v(x) = \int_{-1}^{1} |\cos x(t)|\, dt,$$

and the $X_i$'s are generated as realizations of Brownian motion starting at time $t = -1$, with random starting point $x(-1)$ uniformly distributed on $[-1, 1]$. For the second example, we have

$$m(x) = \int_{-1}^{1} t\, x(t)\, dt, \qquad v(x) = \int_{-1}^{1} |t|\, x^2(t)\, dt,$$

and the $X_i$'s are generated in the same way as in the first example. For the third example, we follow Ferraty et al. (2007) and set

$$m(x) = \int_{-1}^{1} |x'(t)|\, \big(1 - \cos(\pi t)\big)\, dt, \qquad v(x) = \int_{-1}^{1} |x'(t)|\, \big(1 + \cos(\pi t)\big)\, dt.$$

The random curves in this example are simulated from

$$X(t) = \sin(\omega t) + (a + 2\pi)\, t + b, \qquad \omega \sim \mathrm{Unif}(0, 2\pi), \quad a, b \sim \mathrm{Unif}(0, 1).$$
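A minimal sketch of the three simulation designs is given below. The text does not specify the following details, so they are our assumptions:

- Curves are discretized on an equally spaced grid of 201 points on $[-1, 1]$.
- Integrals are computed by the trapezoidal rule.
- $x'(t)$ in Example 3 is approximated by finite differences.
- Responses are generated as $Y_i = m(X_i) + \sqrt{v(X_i)}\,\epsilon_i$ with standard normal $\epsilon_i$.

Function names are hypothetical.

```python
import numpy as np

T = np.linspace(-1.0, 1.0, 201)  # assumed discretization grid on [-1, 1]
DT = T[1] - T[0]


def integrate(f):
    """Trapezoidal rule over the grid T along the last axis."""
    return DT * (f[..., 1:] + f[..., :-1]).sum(axis=-1) / 2.0


def brownian_curves(n, rng):
    """Brownian motion on T started at t = -1 from x(-1) ~ Unif[-1, 1]."""
    start = rng.uniform(-1.0, 1.0, size=(n, 1))
    steps = rng.normal(0.0, np.sqrt(DT), size=(n, T.size - 1))
    return np.hstack([start, start + np.cumsum(steps, axis=1)])


def sine_drift_curves(n, rng):
    """X(t) = sin(w t) + (a + 2 pi) t + b, w ~ Unif(0, 2 pi), a, b ~ Unif(0, 1)."""
    w = rng.uniform(0.0, 2.0 * np.pi, size=(n, 1))
    a = rng.uniform(0.0, 1.0, size=(n, 1))
    b = rng.uniform(0.0, 1.0, size=(n, 1))
    return np.sin(w * T) + (a + 2.0 * np.pi) * T + b


def simulate(example, n, rng):
    """Return curves X, responses Y, and the true m(X_i), v(X_i) for example 1, 2 or 3."""
    X = brownian_curves(n, rng) if example in (1, 2) else sine_drift_curves(n, rng)
    if example == 1:
        m = np.zeros(n)
        v = integrate(np.abs(np.cos(X)))
    elif example == 2:
        m = integrate(T * X)
        v = integrate(np.abs(T) * X**2)
    else:
        dX = np.gradient(X, T, axis=1)  # finite-difference approximation of x'(t)
        m = integrate(np.abs(dX) * (1.0 - np.cos(np.pi * T)))
        v = integrate(np.abs(dX) * (1.0 + np.cos(np.pi * T)))
    eps = rng.standard_normal(n)  # assumed N(0, 1) errors
    return X, m + np.sqrt(v) * eps, m, v
```

A call such as `simulate(2, 200, np.random.default_rng(0))` produces one replicate of the second design with $n = 200$.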

The simulations are performed in R with the publicly available npfda package (http://www.lsp.ups-tlse.fr/staph/npfda/). The default quadratic kernel is used in the implementation. The bandwidths $h_m$ and $h_v$ are chosen by cross-validation.

The choice of semi-metric is in general a difficult problem. In our current simulations, the choices are guided by our knowledge of the true mean and variance functions. For the first two examples we use $d_m(x_1, x_2) = d_v(x_1, x_2) = \big(\int (x_1(t) - x_2(t))^2\, dt\big)^{1/2}$. For the third example we use $d_m(x_1, x_2) = d_v(x_1, x_2) = \big(\int (x_1'(t) - x_2'(t))^2\, dt\big)^{1/2}$. Our simulations also show that these choices are the best among semi-metrics based on different orders of derivatives (results not presented here).

Table 1: Simulation results (MSE) for comparing two variance function estimators.

Estimator                Example 1   Example 2   Example 3
residual-based method    0.10        0.27        4.37
direct method            0.10        0.38        19.24

To evaluate performance, we adopt the discrete mean squared error

$$\mathrm{MSE} = \frac1n \sum_{i=1}^n \big(\hat v(X_i) - v(X_i)\big)^2.$$

We report in Table 1 the median MSE of the two variance function estimators over 100 simulations. The residual-based two-step method performs much better than the direct method in terms of MSE, except in the first example. There the mean function is constant and, as expected, the two methods perform equally well.



3.2 Illustration with Chemometric Data 

We illustrate our approach on the real chemometric dataset, which contains 215 spectra of light absorbance for meat samples as functions of the wavelength. Because the wavelengths at which the measurements are made are dense, the subjects are naturally treated as continuous curves. This dataset has previously been used in nonparametric regression studies where the covariate is the spectral curve and the response is the percentage of fat content in the piece of meat (Ferraty & Vieu 2002, 2009; Ferraty et al. 2007). We estimate the variance function for this regression problem.

A previous study suggested that for mean function estimation, taking as the semi-metric the $L_2$ distance between the second derivatives of the spectra gives favorable results, so we use this semi-metric for the mean function. As in previous studies, we train on the first 150 spectra and use the rest for validation. We then examine the accuracy of the variance function estimate for semi-metrics defined as the $L_2$ distance between derivatives of the curves of different orders. Accuracy is measured by the mean squared error

$$\mathrm{MSE} = \frac{1}{65} \sum_{i=151}^{215} \big(\hat R_i - \hat v(X_i)\big)^2, \qquad \hat R_i = \big(Y_i - \hat m(X_i)\big)^2.$$

Using the $L_2$ distance between first derivatives gives the best result. The estimated variance function values and squared residuals for the validation data are shown in Fig. 1, giving an MSE of 33.18. The heterogeneity of the problem is clearly visible in the figure.
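The semi-metrics compared above are $L_2$ distances between derivatives of a given order $q$; $q = 0$ gives the plain $L_2$ distance. A simple version can be sketched as follows, assuming curves are sampled on a common, possibly uneven grid. Derivatives are approximated by repeated finite differences via `numpy.gradient`. As we understand it, the npfda routines compute derivatives from a B-spline approximation instead, so this is a stand-in rather than the semi-metric actually used. The function name is hypothetical.

```python
import numpy as np


def deriv_semimetric(A, B, grid, q):
    """L2 distance between q-th derivatives of the curves in the rows of A and B.

    Derivatives are approximated by applying np.gradient q times along the
    grid; the integral is computed with the trapezoidal rule.
    """
    dA, dB = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    for _ in range(q):
        dA = np.gradient(dA, grid, axis=1)
        dB = np.gradient(dB, grid, axis=1)
    sq = (dA[:, None, :] - dB[None, :, :]) ** 2
    step = np.diff(grid)
    integral = np.sum((sq[..., 1:] + sq[..., :-1]) * step / 2.0, axis=-1)
    return np.sqrt(integral)
```

Using `q = 2` for $d_m$ and `q = 1` for $d_v$ mirrors the choice reported above for the spectrometric data. Raw finite differences amplify measurement noise, which is one reason smoothing-based derivatives are usually preferred.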

4 Conclusion 

In this article, we study the problem of nonparametrically estimating the variance function in functional data analysis. We derived the asymptotic properties of the estimator based on squared residuals, and simulations demonstrate its superiority to the direct method based on squared responses. Our asymptotic result shows an interesting interaction between the smoothness of the mean function and that of the variance function. Finally, as an illustration, we show that there is clear heterogeneity in the regression problem for the chemometric data.
Figure 1: Estimated variance function vs. squared residuals on validation data (panel title: Cond. Variance, MSE = 33.18).
Appendix

First, we remark that under condition (12) we can assume without loss of generality that $|m(x)| \le M_1$ and $v(x) \le M_2$, that is, the mean and variance functions are bounded. The reason is that we only ever consider values of both functions inside a compact neighborhood of the fixed $x$. For example, in the definition of the estimator $\hat v(x)$, $\Delta_i^v > 0$ only when $d_v(x, X_i) \le h_v$, so the sum over $i$ only involves the $X_i$'s contained in a neighborhood of $x$.

For clarity, we first state the asymptotics for mean function estimation in a lemma. All asymptotic orders obtained below are in the sense of convergence in probability.



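Lemma 1 Under conditions (11)–(13), we have

$$|\hat r_1^m(x) - 1| = O\Big(\sqrt{s_n^m \log n / n^2}\Big), \qquad (15)$$

$$|\hat r_2^m(x) - m(x)| = O\Big(h_m^{\alpha} + \sqrt{s_n^m \log n / n^2}\Big), \qquad (16)$$

$$\big|E(\hat r_2^m / \hat r_1^m) - m(x)\big| = O\Big(h_m^{\alpha} + \sqrt{s_n^m \log n / n^2}\Big), \qquad (17)$$

$$\big|\hat r_2^m / \hat r_1^m - E(\hat r_2^m / \hat r_1^m)\big| = O\Big(h_m^{\alpha} + \sqrt{s_n^m \log n / n^2}\Big). \qquad (18)$$

If, in addition, condition (14) is satisfied, the above convergence is uniform over a compact neighborhood of $x$ in $\mathcal{H}$.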



Proof: The proofs of (15) and (16) are similar to those in Ferraty & Vieu (2004, 2006). On the one hand, the proof is simplified because we only require convergence in probability. On the other hand, imposing conditions directly on the errors instead of the responses makes the proof slightly more complicated. Equations (17) and (18) are direct consequences of the first two equations, and the last statement of the lemma follows from Ferraty & Vieu (2003). We only show (16) below.



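The bias $|E\hat r_2^m(x) - m(x)| = O(h_m^{\alpha})$ is shown exactly as in Ferraty & Vieu (2004). For the variance, we have

$$\mathrm{Var}\big(\hat r_2^m - E(\hat r_2^m \mid X_1^n)\big) = E\Big[\mathrm{Var}\big(\hat r_2^m - E(\hat r_2^m \mid X_1^n) \mid X_1^n\big)\Big] = E\Big[\mathrm{Var}\Big(\frac1n \sum_i \Delta_i^m \sqrt{v_i}\, \epsilon_i \,\Big|\, X_1^n\Big)\Big] \le \frac{M_2}{n^2}\, s_{n,2}^m,$$

using (4) and the boundedness of $v$. Similarly, $\mathrm{Var}\big(E(\hat r_2^m \mid X_1^n) - E\hat r_2^m\big) = O(s_{n,3}^m / n^2)$ using (5). Since

$$\mathrm{Var}(\hat r_2^m) = \mathrm{Var}\big(\hat r_2^m - E(\hat r_2^m \mid X_1^n)\big) + \mathrm{Var}\big(E(\hat r_2^m \mid X_1^n) - E\hat r_2^m\big) = O(s_n^m / n^2),$$

(16) follows from Markov's inequality. □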

Proof of Theorem 1

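Using the decomposition

$$\hat R_i = (Y_i - \hat m_i)^2 = v_i + 2\sqrt{v_i}\,(m_i - \hat m_i)\,\epsilon_i + (m_i - \hat m_i)^2 + v_i(\epsilon_i^2 - 1) =: A_i + B_i + C_i + D_i,$$

and, similarly to the proof of Lemma 1,

$$|\hat r_1^v - 1| = O\Big(\sqrt{s_n^v \log n / n^2}\Big), \qquad (19)$$

we only need to show that

$$\hat r_2^v = \frac1n \sum_{i=1}^n (A_i + B_i + C_i + D_i)\, \Delta_i^v = v(x) + O\Big(h_v^{\beta} + \sqrt{s_n^v \log n / n^2} + h_m^{2\alpha} + s_n^m \log n / n^2\Big).$$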
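For completeness, the decomposition of $\hat R_i$ used above follows from expanding the square. We assume model (1) is written as $Y_i = m_i + \sqrt{v_i}\,\epsilon_i$ with $E(\epsilon_i^2 \mid X_i) = 1$, which is the form the decomposition presupposes:

```latex
(Y_i - \hat m_i)^2
  = \big((m_i - \hat m_i) + \sqrt{v_i}\,\epsilon_i\big)^2
  = v_i \epsilon_i^2 + 2\sqrt{v_i}\,(m_i - \hat m_i)\,\epsilon_i + (m_i - \hat m_i)^2
  = \underbrace{v_i}_{A_i}
    + \underbrace{2\sqrt{v_i}\,(m_i - \hat m_i)\,\epsilon_i}_{B_i}
    + \underbrace{(m_i - \hat m_i)^2}_{C_i}
    + \underbrace{v_i(\epsilon_i^2 - 1)}_{D_i} .
```

Only $B_i$ and $C_i$ involve $\hat m$. The terms $A_i$ and $D_i$ are handled as in the known-mean case, which is why the bound (20) does not depend on $h_m$.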

Using conditions (11)–(13) and following the same steps as the proof of (16), we have

$$\frac1n \sum_{i=1}^n (A_i + D_i)\, \Delta_i^v - v(x) = O\Big(h_v^{\beta} + \sqrt{(s_{n,2}^v + s_{n,3}^v) \log n / n^2}\Big). \qquad (20)$$

Also,



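$$\frac1n \sum_{i=1}^n C_i\, \Delta_i^v \le \sup_i (m_i - \hat m_i)^2\, \hat r_1^v = O\big(h_m^{2\alpha} + s_n^m \log n / n^2\big). \qquad (21)$$

The supremum over $i$ obeys the same rate as for a fixed $x$ for two reasons. First, we only need to consider those $i$ with $d_v(x, X_i) \le h_v$, and these $X_i$ lie in any fixed compact neighborhood of $x$. Second, the final statement of Lemma 1 gives uniform convergence over such a neighborhood.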

Finally, the term $\sum_i \Delta_i^v B_i / n$ is dealt with in Lemma 2. The theorem is proved by combining the following lemma with (19), (20) and (21).



Lemma 2 In the context of Theorem 1, we have

$$\frac1n \sum_{i=1}^n \Delta_i^v B_i = O\left(\sqrt{\frac{(s_{n,2}^v + s_{n,4}^v) \log n}{n^2}} + \frac{s_{n,4}^m}{n^2}\right).$$
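Proof: Writing

$$\frac1n \sum_{i=1}^n \Delta_i^v B_i = F + G + H, \qquad (22)$$

we have $E(F) = E(F \mid X_1^n) = 0$ and

$$\mathrm{Var}(F \mid X_1^n) \le \frac{C}{n^2} \sum_{i,j} \big|m_i - E\hat m_i\big|\, \big|m_j - E\hat m_j\big|\, \big|\mathrm{Cov}(\Delta_i^v \epsilon_i, \Delta_j^v \epsilon_j \mid X_1^n)\big|.$$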
Also, for the second term in (22), we have $E(G) = 0$. Thus $\mathrm{Var}(G) = E\big(\mathrm{Var}(G \mid X_1^n)\big) = O(s_{n,2}^v / n^2)$ and $G = O\big(\sqrt{s_{n,2}^v \log n / n^2}\big)$.
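Finally, for the third term $H$, $|E(H)| = O(s_{n,4}^m / n^2)$ and

$$\mathrm{Var}(H \mid X_1^n) \le \frac{C}{n^2} \sum_{i,j,k,l} \big|\mathrm{Cov}(\Delta_i^v w_{ij} \epsilon_i \epsilon_j, \Delta_k^v w_{kl} \epsilon_k \epsilon_l \mid X_1^n)\big|.$$

Thus $H = O\big(s_{n,4}^m / n^2 + \sqrt{s_{n,4}^v \log n / n^2}\big)$. □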

Proof of Corollary 1: We need to show that in the i.i.d. case, $s_n^m = O(n/\phi_m(h_m))$ and $s_n^v = O(n/\phi_v(h_v))$. We calculate $s_{n,1}^m$, $s_{n,4}^m$ and $s_{n,4}^v$; the calculations for the others are similar.

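In the i.i.d. case, $\mathrm{Cov}(\Delta_i^m, \Delta_j^m) = 0$ for $i \ne j$. Hence $s_{n,1}^m = n\, \mathrm{Var}(\Delta_1^m) \le n\, E(\Delta_1^m)^2 = O(n/\phi_m(h_m))$ by Lemma 4.3 of Ferraty & Vieu (2006).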
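A quick Monte Carlo check illustrates the order $E(\Delta_1^m)^2 = O(1/\phi_m(h_m))$ behind this bound. We use a toy scalar design chosen for illustration: $X \sim \mathrm{Unif}[-1, 1]$, $x = 0$ and $d(x, X) = |X|$, so that $\phi(h) = h$ for $0 < h \le 1$. With the quadratic kernel $K(u) = 1.5(1 - u^2)$ on $[0, 1]$, a direct computation gives $E K = h$ and $E K^2 = 1.2h$. Hence $\phi(h)\, E(\Delta_1)^2 = 1.2$ for every such $h$, and the simulation should reproduce this constant. The function names are hypothetical.

```python
import numpy as np


def quad_kernel(u):
    """Asymmetric quadratic kernel 1.5 * (1 - u^2) on [0, 1], zero elsewhere."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.5 * (1.0 - u**2), 0.0)


def scaled_second_moment(h, n_mc, rng):
    """Monte Carlo estimate of phi(h) * E(Delta_1^2) in the toy design.

    Toy design (assumed): X ~ Unif[-1, 1], x = 0, d(x, X) = |X|, phi(h) = h.
    Delta_1 = K(d/h) / E K(d/h), so E(Delta_1^2) = E K^2 / (E K)^2.
    """
    d = np.abs(rng.uniform(-1.0, 1.0, size=n_mc))
    k = quad_kernel(d / h)
    return h * np.mean(k**2) / np.mean(k) ** 2
```

Evaluating it for several bandwidths, e.g. 0.05, 0.1 and 0.2, should give values close to 1.2 in each case. This is consistent with $E(\Delta_1)^2$ growing like $1/\phi(h)$ as $h \to 0$.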
For $s_{n,4}^m$, we obtain $s_{n,4}^m = O(n/\phi_m(h_m))$. Here we used the fact that $w_{ii} = K(0) / \sum_j K\big(d_m(X_i, X_j)/h_m\big) = O\big((n\phi_m(h_m))^{-1}\big)$, which follows from (15) and Lemma 4.3 of Ferraty & Vieu (2006). For $s_{n,4}^v$, we obtain $s_{n,4}^v = O(n/\phi_v(h_v))$, since $n\phi_m(h_m) \to \infty$ by assumption.